Understanding wages for Registered Nurses.
We’ll use the Tidy Tuesday code to directly load the data from the GitHub repository. We’ll also pass it into janitor::clean_names() to standardize the column names. (Life is too short to have to worry about whitespace and capitalization.)
library(tidyverse)  # loads %>%, dplyr, tidyr, forcats, and ggplot2 used throughout
nurses <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-05/nurses.csv') %>% janitor::clean_names()
A quick skimr::skim() shows 22 columns overall, 21 of them numeric; the only character column is state.
skimr::skim(nurses)
| Name | nurses |
|---|---|
| Number of rows | 1242 |
| Number of columns | 22 |
| Column type frequency: | |
| character | 1 |
| numeric | 21 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| state | 0 | 1 | 4 | 20 | 0 | 54 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1.00 | 2009.00 | 6.64 | 1998.00 | 2.00300e+03 | 2009.00 | 2015.00 | 2020.00 | ▇▆▇▆▇ |
| total_employed_rn | 5 | 1.00 | 47703.88 | 50241.05 | 240.00 | 1.22100e+04 | 31160.00 | 60230.00 | 307060.00 | ▇▂▁▁▁ |
| employed_standard_error_percent | 5 | 1.00 | 4.36 | 3.04 | 0.70 | 2.50000e+00 | 3.50 | 5.10 | 26.10 | ▇▂▁▁▁ |
| hourly_wage_avg | 6 | 1.00 | 28.48 | 6.65 | 9.23 | 2.37000e+01 | 28.25 | 32.39 | 57.96 | ▁▇▆▁▁ |
| hourly_wage_median | 6 | 1.00 | 27.86 | 6.72 | 8.64 | 2.30800e+01 | 27.58 | 31.72 | 56.93 | ▁▇▇▁▁ |
| annual_salary_avg | 6 | 1.00 | 59248.30 | 13829.14 | 19190.00 | 4.93000e+04 | 58750.00 | 67377.50 | 120560.00 | ▁▇▆▁▁ |
| annual_salary_median | 6 | 1.00 | 57957.92 | 13978.95 | 17970.00 | 4.79950e+04 | 57375.00 | 65987.50 | 118410.00 | ▁▇▇▁▁ |
| wage_salary_standard_error_percent | 6 | 1.00 | 1.27 | 0.70 | 0.40 | 9.00000e-01 | 1.10 | 1.42 | 7.50 | ▇▁▁▁▁ |
| hourly_10th_percentile | 6 | 1.00 | 20.23 | 4.66 | 6.38 | 1.68100e+01 | 20.04 | 23.54 | 36.62 | ▁▆▇▃▁ |
| hourly_25th_percentile | 6 | 1.00 | 23.54 | 5.51 | 7.33 | 1.94700e+01 | 23.24 | 27.01 | 45.18 | ▁▇▇▂▁ |
| hourly_75th_percentile | 6 | 1.00 | 32.92 | 8.07 | 10.04 | 2.72100e+01 | 32.61 | 37.33 | 71.07 | ▁▇▅▁▁ |
| hourly_90th_percentile | 6 | 1.00 | 38.16 | 9.23 | 12.33 | 3.25100e+01 | 37.50 | 43.41 | 83.35 | ▁▇▅▁▁ |
| annual_10th_percentile | 6 | 1.00 | 42087.70 | 9694.20 | 13260.00 | 3.49575e+04 | 41670.00 | 48955.00 | 76180.00 | ▁▆▇▃▁ |
| annual_25th_percentile | 6 | 1.00 | 48968.81 | 11469.49 | 15260.00 | 4.04875e+04 | 48335.00 | 56195.00 | 93970.00 | ▁▇▇▂▁ |
| annual_75th_percentile | 6 | 1.00 | 68464.53 | 16777.63 | 20890.00 | 5.65975e+04 | 67835.00 | 77637.50 | 147830.00 | ▁▇▅▁▁ |
| annual_90th_percentile | 6 | 1.00 | 79367.01 | 19201.21 | 25650.00 | 6.76200e+04 | 78015.00 | 90290.00 | 173370.00 | ▁▇▅▁▁ |
| location_quotient | 649 | 0.48 | 1.01 | 0.19 | 0.32 | 9.00000e-01 | 1.01 | 1.13 | 1.50 | ▁▁▇▇▁ |
| total_employed_national_aggregate | 4 | 1.00 | 134075563.81 | 6133532.52 | 124143490.00 | 1.29059e+08 | 131713800.00 | 138885360.00 | 147838700.00 | ▅▇▅▃▃ |
| total_employed_healthcare_national_aggregate | 4 | 1.00 | 7268640.12 | 943177.74 | 5854360.00 | 6.22654e+06 | 7250140.00 | 8076300.00 | 8727310.00 | ▇▃▅▅▆ |
| total_employed_healthcare_state_aggregate | 2 | 1.00 | 134743.23 | 143540.40 | 110.00 | 3.34475e+04 | 87435.00 | 175292.50 | 844930.00 | ▇▂▁▁▁ |
| yearly_total_employed_state_aggregate | 0 | 1.00 | 2387208.60 | 2774288.09 | 110.00 | 5.96520e+05 | 1557110.00 | 2888682.50 | 17382400.00 | ▇▂▁▁▁ |
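The column-type counts and `n_missing` values in the skim summary can also be cross-checked with a few lines of base R. This is a minimal sketch on a small hypothetical data frame (`toy`, with made-up values) standing in for `nurses`; on the real data you would pass `nurses` instead.

```r
# Hypothetical toy data standing in for `nurses` (values are made up)
toy <- data.frame(
  state = c("State A", "State B", "State C"),
  year = c(2020, 2020, 2020),
  hourly_wage_median = c(30.5, 41.2, NA),
  location_quotient = c(1.1, NA, NA),
  stringsAsFactors = FALSE
)

# Count columns by their (first) class, like skim's "Column type frequency"
col_types <- table(vapply(toy, function(x) class(x)[1], character(1)))

# Number of missing values per column, like skim's n_missing
n_missing <- colSums(is.na(toy))

print(col_types)
print(n_missing)
```

On the real data, `colSums(is.na(nurses))` should line up with skim's `n_missing` column, for example the 649 missing values in `location_quotient`.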
head(nurses)
# A tibble: 6 × 22
state year total_employed_rn employed_standar… hourly_wage_avg
<chr> <dbl> <dbl> <dbl> <dbl>
1 Alabama 2020 48850 2.9 29.0
2 Alaska 2020 6240 13 45.8
3 Arizona 2020 55520 3.7 38.6
4 Arkansas 2020 25300 4.2 30.6
5 California 2020 307060 2 58.0
6 Colorado 2020 52330 2.8 37.4
# … with 17 more variables: hourly_wage_median <dbl>,
# annual_salary_avg <dbl>, annual_salary_median <dbl>,
# wage_salary_standard_error_percent <dbl>,
# hourly_10th_percentile <dbl>, hourly_25th_percentile <dbl>,
# hourly_75th_percentile <dbl>, hourly_90th_percentile <dbl>,
# annual_10th_percentile <dbl>, annual_25th_percentile <dbl>,
# annual_75th_percentile <dbl>, annual_90th_percentile <dbl>, …
Let’s look at how the rows are distributed across years.
nurses %>%
count(year)
# A tibble: 23 × 2
year n
<dbl> <int>
1 1998 54
2 1999 54
3 2000 54
4 2001 54
5 2002 54
6 2003 54
7 2004 54
8 2005 54
9 2006 54
10 2007 54
# … with 13 more rows
Hmmm. There are 54 entries per year, because the data include the District of Columbia, the Virgin Islands, Puerto Rico, and Guam in addition to the 50 states.
nurses %>%
count(state)
# A tibble: 54 × 2
state n
<chr> <int>
1 Alabama 23
2 Alaska 23
3 Arizona 23
4 Arkansas 23
5 California 23
6 Colorado 23
7 Connecticut 23
8 Delaware 23
9 District of Columbia 23
10 Florida 23
# … with 44 more rows
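Together with the `count(year)` output, this suggests a balanced panel: each of the 54 states and territories appears once in each of the 23 years, and 54 × 23 = 1242 matches the row count reported by skim. A minimal base R sketch of that check, shown on a hypothetical toy grid (the real check would use `nurses$state` and `nurses$year`):

```r
# Hypothetical toy panel: 3 states observed in each of 3 years
toy <- expand.grid(
  state = c("State A", "State B", "State C"),
  year = 1998:2000,
  stringsAsFactors = FALSE
)

# Cross-tabulate state by year; a balanced panel has exactly one row per cell
counts <- table(toy$state, toy$year)
is_balanced <- all(counts == 1) &&
  nrow(toy) == length(unique(toy$state)) * length(unique(toy$year))

print(is_balanced)
```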
The mean number of employed RNs over all states and territories shows an upward trend, except for a blip in 2012-2013.
nurses %>%
group_by(year) %>%
summarize(mean_employed_rn = mean(total_employed_rn, na.rm=TRUE)) %>%
ggplot() +
aes(x=year, y=mean_employed_rn) +  # ggplot layers are added with +, not piped with %>%
geom_line()
Let’s visualize whether hourly wages are increasing or decreasing across the dataset with a heatmap: year on the x-axis, state on the y-axis, and fill mapped to hourly_wage_median. Reversing the state factor with forcats::fct_rev() makes the states read alphabetically from top to bottom:
nurses %>%
mutate(state=forcats::fct_rev(state)) %>%
ggplot() +
aes(x=year, y=state, fill=hourly_wage_median) +
geom_tile()
Looking for trends in the nurses data, let’s scale each state’s wages so we can emphasize whether there were increases or decreases within that state. We’re only looking for trends here, and for whether those trends have the same slope in every state.
Note that by scaling within a state (transforming each value to a z-score), we lose information about absolute wage levels, but we can see whether wages are steadily increasing in each state or territory.
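The within-state scaling computes `(x - mean(x)) / sd(x)` separately for each state. A minimal base R sketch on hypothetical toy wages (made-up values, not from the dataset) shows what this does and what it discards: both toy states end up on the same scale even though their wage levels differ by roughly 20 dollars an hour.

```r
# Hypothetical toy wages for two states over three years (made-up values)
toy <- data.frame(
  state = rep(c("State A", "State B"), each = 3),
  year = rep(2018:2020, times = 2),
  wage = c(20, 22, 24, 40, 41, 45),
  stringsAsFactors = FALSE
)

# z-score within each state: ave() applies the function to each state's subset
# and returns a vector aligned with the original rows
toy$z <- ave(toy$wage, toy$state, FUN = function(x) as.numeric(scale(x)))

print(toy)
```

Within each state the scaled values have mean 0 and standard deviation 1, which is why the heatmap emphasizes each state’s trajectory rather than its wage level. In the dplyr pipeline, `scale()` inside the grouped `mutate()` computes the same per-state z-scores.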
With some exceptions (Guam and the Virgin Islands), registered nurses in most states and territories saw an increase in median hourly wages from 1998 to 2020.
nurses %>%
mutate(state=forcats::fct_rev(state)) %>%
group_by(state) %>%
mutate(scaled_income = as.numeric(scale(hourly_wage_median))) %>%  # per-state z-score; as.numeric drops scale()'s matrix wrapper
ggplot() +
aes(x=year, y=state, fill=scaled_income) +
geom_tile(color="grey10") +
scale_fill_distiller() +
bplots::theme_avenir()
Having looked at median hourly wages, we can ask whether the same trends hold for registered nurses at the 10th and 90th percentiles of hourly wages, starting with the 10th percentile.
nurses %>%
mutate(state=forcats::fct_rev(state)) %>%
group_by(state) %>%
mutate(scaled_income = as.numeric(scale(hourly_10th_percentile))) %>%
ggplot() +
aes(x=year, y=state, fill=scaled_income) +
geom_tile(color="grey10") +
scale_fill_distiller() +
bplots::theme_avenir() +
theme(axis.text.x=element_text(angle=90))
For the most part, RNs at the 90th percentile of hourly wages have seen their income level off since about 2008; after that year, 90th-percentile wages look fairly static.
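One way to put a number on this leveling off is to fit separate linear trends to the years up to 2008 and the years after it, then compare the slopes state by state. A minimal sketch on a single hypothetical series (made-up values, not from the dataset); the 2008 break point is taken from the observation above rather than estimated from the data.

```r
# Hypothetical series: +1.0 $/hour per year through 2008, +0.2 afterwards
years <- 1998:2020
wage <- ifelse(years <= 2008,
               30 + 1.0 * (years - 1998),
               40 + 0.2 * (years - 2008))
toy <- data.frame(year = years, wage = wage)

# Fit a linear trend to each period and pull out the year coefficient
slope_before <- coef(lm(wage ~ year, data = toy[toy$year <= 2008, ]))[["year"]]
slope_after  <- coef(lm(wage ~ year, data = toy[toy$year > 2008, ]))[["year"]]

print(c(before = slope_before, after = slope_after))
```

For the real data, the same two fits could be run per state (for example with `split(nurses, nurses$state)` and `lapply()`) on both `hourly_90th_percentile` and `hourly_10th_percentile`.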
nurses %>%
mutate(state=forcats::fct_rev(state)) %>%
group_by(state) %>%
mutate(scaled_income = as.numeric(scale(hourly_90th_percentile))) %>%
ggplot() +
aes(x=year, y=state, fill=scaled_income) +
geom_tile(color="grey10") +
scale_fill_distiller() +
bplots::theme_avenir() +
ggtitle("90th-percentile RNs saw slower income growth than the 10th percentile")
One question we might ask is whether states group together in terms of their wage increases.
We can explore this by pivoting the data into a matrix suitable as input to heatmaply::heatmaply() from the {heatmaply} package.
Here, hourly_wage_median supplies the values of our matrix; rows correspond to states and columns to years.
nurse_median_frame <- nurses %>%
select(state, year, hourly_wage_median) %>%
arrange(year) %>%
tidyr::pivot_wider(names_from = year, values_from = hourly_wage_median)
nurse_median_matrix <- as.matrix(nurse_median_frame[, -1])  # drop the state column, keep the wage columns
rownames(nurse_median_matrix) <- nurse_median_frame[["state"]]  # set row names on the matrix, not on the tibble
head(nurse_median_matrix)
1998 1999 2000 2001 2002 2003 2004 2005 2006
Alabama 17.63 18.09 19.60 19.99 20.60 20.81 21.23 22.43 23.52
Alaska 22.37 23.02 24.90 26.13 26.45 26.47 28.69 28.54 30.41
Arizona 19.37 20.26 21.97 22.23 23.35 23.88 25.12 26.90 28.06
Arkansas 16.66 17.18 18.02 18.44 19.20 19.98 21.17 22.63 23.62
California 23.95 25.12 26.50 27.36 28.38 29.47 31.61 33.15 35.23
Colorado 19.79 20.47 21.77 22.56 23.17 23.88 25.60 26.91 28.15
2007 2008 2009 2010 2011 2012 2013 2014 2015
Alabama 24.92 25.80 26.48 26.44 26.41 26.02 26.20 26.39 26.70
Alaska 33.48 34.42 35.33 37.39 38.67 38.73 40.08 41.12 42.37
Arizona 29.17 30.59 31.78 33.11 34.42 34.24 34.14 34.00 34.38
Arkansas 24.17 24.78 25.10 25.28 25.90 26.16 26.56 26.72 26.76
California 36.77 38.93 39.86 41.03 42.51 43.88 45.34 46.38 48.27
Colorado 29.69 30.76 31.74 31.81 32.35 32.22 32.73 32.83 32.95
2016 2017 2018 2019 2020
Alabama 26.68 27.20 27.85 28.27 28.19
Alaska 41.01 41.45 42.14 43.54 45.23
Arizona 34.94 35.70 36.43 36.93 37.98
Arkansas 27.26 27.68 28.68 29.01 29.97
California 48.30 48.43 50.20 53.18 56.93
Colorado 33.05 34.27 35.03 36.10 36.78
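For comparison, the same state-by-year matrix can be built in base R with `tapply()`, which puts one grouping variable on the rows and the other on the columns. A minimal sketch on hypothetical toy data; it assumes exactly one row per state and year, as in this dataset, and leaves any missing state-year combination as `NA`.

```r
# Hypothetical long-format data: one row per state and year (made-up wages)
toy_long <- data.frame(
  state = rep(c("State A", "State B"), times = 3),
  year = rep(2018:2020, each = 2),
  hourly_wage_median = c(25, 40, 26, 41, 27, 43),
  stringsAsFactors = FALSE
)

# Rows = state, columns = year; identity() keeps the single value in each cell
wage_matrix <- with(toy_long,
                    tapply(hourly_wage_median, list(state, year), identity))

print(wage_matrix)
```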
We can now ask questions about the actual (unscaled) wage values. We tell heatmaply to compute a dendrogram only for the rows (states), so we can look for clustering patterns among states.
Note that we have to set the scale argument to "none" here.
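The row dendrogram comes from hierarchically clustering the rows of the matrix. A minimal base R sketch of that idea uses `dist()` and `hclust()` on a hypothetical toy matrix; Euclidean distance and complete linkage are assumptions here and may not match heatmaply’s own settings. The commented line shows how such a call might look with heatmaply’s `dendrogram` and `scale` arguments; treat it as an assumption as well.

```r
# Hypothetical toy wage matrix: rows = states, columns = years (made-up values)
wages <- rbind(
  "State A" = c(20.0, 21.0, 22.0),
  "State B" = c(20.5, 21.5, 22.5),
  "State C" = c(35.0, 37.0, 39.0),
  "State D" = c(36.0, 38.0, 40.0)
)
colnames(wages) <- c("2018", "2019", "2020")

# Assumed heatmaply call with a row-only dendrogram and no scaling:
# heatmaply::heatmaply(wages, dendrogram = "row", scale = "none")

# Cluster rows on Euclidean distance between raw wage trajectories
hc <- hclust(dist(wages), method = "complete")
groups <- cutree(hc, k = 2)

print(groups)
```

On raw values, the low-wage and high-wage toy states fall into separate clusters, which is the kind of grouping the row dendrogram is meant to reveal.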
If we are interested in relative (scaled) values, the dendrogram is a little less interesting. Overall, you can see that every state showed an increase in median hourly wage over the years.
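One possible reason the scaled dendrogram is less informative: once each row is z-scored, states whose wages rise steadily look alike regardless of their level or growth rate. A minimal sketch with a hypothetical toy matrix (made-up values) in which every row rises linearly shows all pairwise distances collapsing to zero after row scaling.

```r
# Hypothetical toy wage matrix: each row rises linearly, at different levels and rates
wages <- rbind(
  "State A" = c(20.0, 21.0, 22.0),
  "State B" = c(20.5, 21.5, 22.5),
  "State C" = c(35.0, 37.0, 39.0),
  "State D" = c(36.0, 38.0, 40.0)
)

# Row-wise z-scores: scale() works on columns, so transpose in and out
scaled_rows <- t(scale(t(wages)))

# Distances on raw values vs. on row-scaled values
raw_dist <- dist(wages)
scaled_dist <- dist(scaled_rows)

print(round(scaled_rows, 3))
```

Every toy row becomes (-1, 0, 1) after scaling, so the clustering has nothing left to separate even though the raw distances are large.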
This was a nice dataset to get back into Tidy Tuesday.
For attribution, please cite this work as
Laderas (2021, Oct. 5). Ted Laderas, PhD: Registered Nurses in the United States and Territories. Retrieved from https://laderast.github.io/articles/2021-10-05-registered-nurses/
BibTeX citation
@misc{laderas2021registered,
author = {Laderas, Ted},
title = {Ted Laderas, PhD: Registered Nurses in the United States and Territories},
url = {https://laderast.github.io/articles/2021-10-05-registered-nurses/},
year = {2021}
}